[PARKED — v0.3.2+ track] Mac VZ branch refresh against v0.3.0 (rebase complete, not for v0.3.1 merge) #3

Draft
SashaMIT wants to merge 22 commits into main from sash/local-test-v030
SashaMIT commented May 28, 2026

STATUS — PARKED (v0.3.2+ track), per Anders review 2026-05-29. This PR is not a v0.3.1 merge candidate. macOS stays browser-hosted for v0.3.1; native Mac VZ / Linux-microVM parity is gated behind real Apple Silicon hardware operator-testing. Held as a parallel branch and folded in no earlier than v0.3.2. Decisions recorded in docs/mac-vz/v030-rebase/DECISIONS.md.

Merge acceptance bar (all required before un-parking): rebase onto current main · Linux CI green · Mac CI green or explicitly justified · real Apple Silicon signed-build smoke · hunk-level review of carrier_bridge.rs / supervisor.rs / vm_provider.rs · no regression to Linux / Home / Browser / Carrier paths.

Post-rebase follow-up day plan (executing against the decisions): D1 PR reframe + scoping-parity verify · D2 SUN_LEN Darwin socket-path fix · D3 principal-scoping parity test · D4 components.json hygiene + provider_call audit + codesigning doc.


Draft / review-ready. Four-day rebase complete. Linux CI green, Mac VZ CI down to 6 known v0.3.0-on-Mac SUN_LEN failures (not rebase regressions). The matching CVE work is in #2 (also draft, ready when you are). The pre-rebase Mac VZ branch is preserved at the tag archive/local-test-pre-v030-rebase.

Why this PR exists

After v0.3.0 shipped, the existing Mac VZ branch (sash/local-test) was rooted on v0.2.0 main — same staleness problem we just solved for the CVE work. Same fix: rebase onto v0.3.0 in deliberate, daily increments so when you're ready to look at the Mac branch, it's against current main rather than against the version you just released.

Day-by-day landed

| Day | What landed | HEAD |
| --- | --- | --- |
| 1 | Baseline only: branch identical to PR #2 HEAD, all 3 CI lanes green, zero Mac VZ work yet. Notes: docs/mac-vz/v030-rebase/DAY_1.md | 06b91ce |
| 2 | Conflict-free Mac VZ pieces: full elastos-vz crate (26 files), elastos-crosvm cfg-gating, setup.rs (Mac platform detection), test infrastructure. Notes: docs/mac-vz/v030-rebase/DAY_2.md | 0338996 |
| 3 | The big three-way merge: supervisor.rs + vm_provider.rs + carrier_bridge.rs reconciled, plus doctor_cmd.rs, vm_debug_cmd.rs, home_cmd.rs, run_cmd.rs, main.rs, all v0.3.0-dependent tests re-staged. Notes: docs/mac-vz/v030-rebase/DAY_3.md | f97203f |
| 4 | Sign-off: carrier_bridge.rs v0.3.0 carrier_invoke ABI + principal-aware logic merge (Day 3 had inverted the layering; now fixed), fuzz harness restored from archive, linux-untouched.yml gate re-baselined to Day-4 HEAD, workspace walk-through confirmed no other Mac VZ surface lost. Notes: docs/mac-vz/v030-rebase/DAY_4.md | 401a520 |

Day 4 final-state validation

```text
$ cargo check --workspace --tests # clean
$ cargo clippy --workspace --tests -- -D warnings # clean
$ cargo fmt --all -- --check # silent (clean)

$ cargo test -p elastos-server --lib carrier_bridge:: # 24/24 pass
$ cargo test -p elastos-server --lib supervisor:: # 60/60 pass
$ cargo test -p elastos-vz --lib # 108/108 pass
$ cargo test -p elastos-server --lib # 646 / 6 / 2 (pass / fail / ignored)
```

The 6 failures are all in gateway_browser_route_tests::test_browser_* and all fail with path must be shorter than SUN_LEN — v0.3.0's gateway_browser_stream path-construction exceeds Darwin's 104-byte sun_path limit. Pre-existing v0.3.0-on-Mac platform issue, not a rebase regression. Concrete diagnosis in DAY_3.md § "Concrete root cause".

CI lanes

| Check | Day 4 final | Notes |
| --- | --- | --- |
| Linux-untouched gate (Vz backend) | ✅ green | Re-baselined to Day-4 HEAD 65f5f05. Day 3+4 commits only touch elastos-server (NOT protected). |
| CI (Linux build + tests) | ✅ green | Full Linux test suite passes. |
| Mac Vz CI (Phase 5+ Apple Silicon) | ❌ 6 SUN_LEN failures | Same set as Day 3, all v0.3.0-on-Mac platform issues. No rebase regressions. |

Known issues carried forward (NOT rebase regressions)

  1. 6 Mac CI gateway_browser SUN_LEN failures — concrete root cause in DAY_3.md. Project-level decision pending.
  2. components.json lacks darwin-arm64 — v0.3.0 main removed Mac platform declarations. Capsule release-pipeline gap.
  3. Two elastos-vz real-kernel boot tests require the Apple com.apple.security.virtualization entitlement (codesigning).

Anders handoff message

A paste-ready three-paragraph summary lives at docs/vz-backend/V030_MESSAGE_DRAFT.md. Refreshed for Day 4 sign-off — replaces the pre-rebase "two real conflicts pending" framing with "rebase complete, here's the diff shape, no action requested."

Made with Cursor

SashaMIT and others added 20 commits May 28, 2026 12:05
…5→21 vulns)

Rebases the CVE-hygiene work onto v0.3.0 main (commit 8acb72d). Day 1 focuses
on dependency-side fixes only — no code edits, no API migrations. All changes
are confined to Cargo.toml / Cargo.lock.

Result on Linux build target:
  cargo audit baseline (v0.3.0):  35 vulnerabilities / 12 warnings
  cargo audit after Day 1:        21 vulnerabilities /  7 warnings
  delta:                          14 CVEs closed, 5 warnings closed

What changed:

  elastos-storage/Cargo.toml
    lru 0.12 → 0.18 (closes RUSTSEC-XXXX unmaintained on lru 0.12.x)

  elastos-server/Cargo.toml
    + distributed-topic-tracker = "=0.2.7"
        0.2.8 silently bumped iroh ^0.96 → ^0.97; would clash with our
        direct iroh 0.96 pin. Same constraint as the v0.2.0 CVE branch
        (PR #1 commit d32cc3a) and documented inline.
    + ed25519 = "=3.0.0-rc.4"
    + pkcs8 = "=0.11.0-rc.11"
    + signature = "=3.0.0-rc.10"
        distributed-topic-tracker pulls in ed25519-dalek 3.0.0-pre.1, which
        was published against pre-release versions of three RustCrypto crates.
        The released ed25519 3.0.0 / pkcs8 0.11.0 / signature 3.0.0 have
        API-incompatible changes (pkcs8::Error variants moved). cargo update
        otherwise rolls them forward and breaks ed25519-dalek inside the
        registry source. Holding all three at the rc versions v0.3.0 shipped
        keeps the graph buildable until ed25519-dalek 3.x-stable releases.
    - crossterm = "0.28"
    - ratatui = "0.29"
        Dead deps in elastos-server src (verified via grep — only used in
        capsules/chat under its own `tui` feature gate, unaffected here).
        Removing them collapsed the lru 0.12.5 / atomic-polyfill / paste
        transitive chain that surfaced as unmaintained warnings.

  Cargo.lock
    Refreshed via `cargo update` (plain, NOT --aggressive — the rc-chain
    pins make aggressive too dangerous; see comment block in elastos-server
    Cargo.toml for the full rationale).

What's left for Day 2+:

  18 wasmtime CVEs  → Day 2: wasmtime 17 → 36 bump (cap-primitives transitive
                       also closes automatically, that's −19 total)
   2 hickory-proto  → DEFERRED (RUSTSEC-2026-0118/0119 require hickory 0.26
                       which is still pre-release; documented disposition
                       matches the old v0.2.0 branch's audit decisions)
   1 rsa            → NEW in v0.3.0 from elastos-auth (RUSTSEC-2023-0071,
                       Marvin attack). Flagged to Anders in V030_MESSAGE_DRAFT.md
                       for project-level decision before patching.

Verification:
  - cargo check --workspace --exclude elastos-crosvm --exclude elastos-server
      --exclude elastos-runtime → green on macOS host
  - elastos-crosvm fails to build on macOS due to pre-existing Linux-only
    libc bindings in v0.3.0 (sockaddr_in.sin_len missing, ioctl signature
    mismatch). Not a Day 1 regression — same error reproduces on plain
    `git checkout origin/main && cargo check`. Linux CI will verify.
  - cargo fmt --all --check: clean
  - Branch is pushed under `chore/runtime-cve-hygiene-v030` (parallel to the
    existing PR #1 v0.2.0 branch; swap proposed on Day 4).
Co-authored-by: Cursor <cursoragent@cursor.com>
Concrete metrics, the why behind each pin (especially the rc-chain),
remaining vuln breakdown by cluster, and the Day 2–4 plan. Lives on
the rebase branch so anyone looking at the diff finds the rationale.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ulns)

Closes 18 vulnerabilities in one bump: the entire wasmtime cluster
(RUSTSEC-2026-0020/-0021/-0085/-0086/-0087/-0088/-0089/-0091/-0092/-0093
/-0094/-0095/-0096/-0149, -2025-0046/-0118, -2024-0438) plus the
cap-primitives transitive (RUSTSEC-2024-0445).

Cumulative result on Linux build target:
  cargo audit on v0.3.0 baseline:  35 vulnerabilities / 12 warnings
  cargo audit after Day 1:         21 vulnerabilities /  7 warnings
  cargo audit after Day 2:          3 vulnerabilities /  5 warnings
  Day 2 delta:                     -18 vulnerabilities / -2 warnings

The 3 remaining vulnerabilities are out of scope for this branch:
  - 2 hickory-proto CVEs (RUSTSEC-2026-0118/-0119) — DEFERRED, require
    hickory 0.26 which is still pre-release. Same disposition as the
    v0.2.0 CVE branch and documented there.
  - 1 rsa Marvin attack (RUSTSEC-2023-0071) — NEW in v0.3.0 via
    elastos-auth / wallet-provider. Flagged to Anders in
    V030_MESSAGE_DRAFT.md for project-level decision before patching.

==== What changed ====

elastos-compute/Cargo.toml
  wasmtime    17 → 36 (current stable line)
  wasmtime-wasi 17 → 36
  wasi-common: REMOVED (no longer pulled in directly; preview1 host
               functions now come from wasmtime_wasi::preview1)
  Inline comment block documents the migration rationale and the new
  FIFO carrier transport.

elastos-compute/src/providers/wasm.rs (port of archive/runtime-cve-hygiene-v020-base + v0.3.0 principal binding)
  WASI API migration:
    wasi_common::pipe::{ReadPipe, WritePipe}     → REMOVED
    wasmtime_wasi::sync::WasiCtxBuilder          → wasmtime_wasi::WasiCtxBuilder
    wasmtime_wasi::WasiCtx                       → wasmtime_wasi::preview1::WasiP1Ctx
    + use wasmtime_wasi::{DirPerms, FilePerms}
    + use wasmtime_wasi::preview1::{self, WasiP1Ctx}
  Carrier bridge transport rewrite:
    OLD: fd-injection via WasiCtx::insert_file (FD 3/4)
         REMOVED upstream in wasmtime-wasi 24+ — no workaround
    NEW: per-launch carrier dir + two FIFOs (request, response) under
         /tmp/elastos-carrier/<capsule-id>/, preopened into the WASI
         sandbox at /_carrier via DirPerms::READ + FilePerms::READ|WRITE.
         New helpers: carrier_dir_for(), setup_carrier_fifos(),
         cleanup_carrier_dir(), mkfifo(), CARRIER_GUEST_DIR constant,
         BridgeTransport enum (currently single variant Fifos), Drop
         impl on RunningInstance to GC the carrier dir.
  Type-complexity fix preserved from PR #1 commit 46e3edd:
    WasiContextWithBridge type alias for build_wasi_context return.
  Stricter entrypoint policy (v0.3.0 change ported on top):
    Missing _start now returns an ElastosError::Compute instead of
    silently falling back to main() or emitting a warning.

  v0.3.0's principal-binding work, layered back on top of the archive port:
    + BridgePipes.capsule_id: String
    + BridgePipes.principal_id: Option<String>
    + WasmProvider.bridge_principals (Arc<RwLock<HashMap<...>>>)
    + WasmProvider::set_bridge_principal() / clear_bridge_principal()
    + build_wasi_context() now takes principal_id: Option<&str> and
      threads it through setup_carrier_fifos() into the returned pipes.
    + Both test manifests get authority: None (v0.3.0 added the field).

  Tests included (all from the archive branch's coverage):
    test_default_bridge_transport_is_fifos
    test_setup_carrier_fifos_creates_dir_and_fifos_with_correct_modes
    test_cleanup_carrier_dir_is_idempotent
    test_setup_carrier_fifos_round_trip_via_host_ends
    test_mkfifo_returns_typed_error_on_invalid_path
    test_running_instance_drop_cleans_carrier_dir
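
The per-launch carrier directory and its Drop-based cleanup described above can be sketched with the standard library alone. This is a minimal illustration, not the code in wasm.rs: the function names `carrier_dir_for` and `cleanup_carrier_dir` come from the commit message, but their signatures here (an explicit `root` argument) are assumptions, `CarrierDirGuard` is a hypothetical stand-in for `RunningInstance`, and FIFO creation is omitted because it needs a `mkfifo` binding.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Assumed layout: one directory per capsule under a shared root.
fn carrier_dir_for(root: &Path, capsule_id: &str) -> PathBuf {
    root.join(capsule_id)
}

/// Removes the carrier directory. A directory that is already gone is not
/// treated as an error, which makes repeated calls safe (idempotent).
fn cleanup_carrier_dir(dir: &Path) -> io::Result<()> {
    match fs::remove_dir_all(dir) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

/// Hypothetical stand-in for `RunningInstance`: owns the carrier dir and
/// removes it when dropped.
struct CarrierDirGuard {
    dir: PathBuf,
}

impl CarrierDirGuard {
    fn create(root: &Path, capsule_id: &str) -> io::Result<Self> {
        let dir = carrier_dir_for(root, capsule_id);
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }
}

impl Drop for CarrierDirGuard {
    fn drop(&mut self) {
        // Best effort: Drop cannot return an error, so failures are ignored.
        let _ = cleanup_carrier_dir(&self.dir);
    }
}
```

Tying cleanup to Drop means the directory is collected on every exit path of the owning scope, including early returns, which is presumably why the commit attaches it to the instance type rather than calling cleanup by hand.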

==== Reconciliation notes ====

The v0.3.0 main delta on wasm.rs (+27/-12) was small and focused on
principal binding. The CVE branch delta (+500/-113) was the wasmtime
migration plus FIFO transport. They touched mostly disjoint regions, so
the port was a copy-archive-then-overlay-v0.3.0 operation:

  1. Copy elastos-compute/src/providers/wasm.rs from
     archive/runtime-cve-hygiene-v020-base (the known-working post-CVE
     end-state).
  2. Add v0.3.0's two new fields to BridgePipes.
  3. Add v0.3.0's bridge_principals field + setter/clearer to WasmProvider.
  4. Thread principal_id through setup_carrier_fifos and build_wasi_context.
  5. Apply v0.3.0's stricter _start policy and the TODO-comment cleanup.
  6. Add authority: None to both test manifests.

carrier_bridge.rs was NOT touched on Day 2. Its only API contact with
wasm.rs is BridgePipes { capsule_stdout, capsule_stdin } — neither of
which moved. The two new BridgePipes fields (capsule_id, principal_id)
are passive additions that consumers can read but aren't required to.
Day 3 will handle any carrier_bridge.rs reconciliation needed for the
axum-server 0.7 → 0.8 / rustls-pemfile migration.

==== Verification ====

  cargo check -p elastos-compute (Mac):              green
  cargo check --workspace --exclude crosvm/server/runtime (Mac): green
  cargo clippy --workspace --exclude crosvm/server/runtime/guest
    --all-targets -- -D warnings (Mac):              green
  cargo fmt --all --check:                           clean
  Linux CI on PR #2:                                 pushed for verification

The elastos-crosvm / elastos-guest Mac compile failures are pre-existing
in v0.3.0 (sockaddr_in.sin_len, ioctl signature, openpty winsize
pointer) and reproduce on plain origin/main. Not caused by Day 2.

Co-authored-by: Cursor <cursoragent@cursor.com>
Records the wasmtime 17→36 migration + FIFO transport reconciliation,
including the three-way diff that informed the copy-then-overlay
strategy, the API-surface preservation evidence, and the cluster-by-
cluster CVE closure ledger.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ulns / 4 warnings)

Closes the rustls-pemfile unmaintained warning (RUSTSEC-2025-0134) by
bumping axum-server to 0.8 (which itself dropped the dep) and migrating
the elastos-tls cert/key loader from `rustls_pemfile::{certs,
pkcs8_private_keys}` to the in-tree `rustls_pki_types::pem::PemObject`
API. Wire format is identical PEM; the returned types
(`CertificateDer<'static>`, `PrivatePkcs8KeyDer<'static>`) are the same
ones we were already constructing manually one line below.

Cumulative result on Linux build target:
  cargo audit on v0.3.0 baseline:  35 vulnerabilities / 12 warnings
  cargo audit after Day 1:         21 vulnerabilities /  7 warnings
  cargo audit after Day 2:          3 vulnerabilities /  5 warnings
  cargo audit after Day 3:          3 vulnerabilities /  4 warnings
  Day 3 delta:                     -1 warning (rustls-pemfile)

The 3 remaining vulnerabilities are unchanged from Day 2 and are all
out of scope for this branch (2 hickory deferred upstream, 1 rsa Marvin
flagged to Anders).

==== What changed ====

elastos-tls/Cargo.toml
  axum-server      0.7 → 0.8 (drops rustls-pemfile transitive)
  rustls-pemfile   "2.0"     → REMOVED (closes RUSTSEC-2025-0134)
  Inline comment documents the migration target.

elastos-tls/src/lib.rs (start_tls_proxy cert/key loader)
  - use rustls_pemfile::{certs, pkcs8_private_keys};
  - use std::io::BufReader;
  + use tokio_rustls::rustls::pki_types::{pem::PemObject, CertificateDer, PrivatePkcs8KeyDer};

  - let cert_file = fs::File::open(cert_path)?;
  - let key_file = fs::File::open(key_path)?;
  - let certs = certs(&mut BufReader::new(cert_file)).collect::<Result<Vec<_>, _>>()?;
  - let keys = pkcs8_private_keys(&mut BufReader::new(key_file)).collect::<Result<Vec<_>, _>>()?;
  - let key = keys.into_iter().next().ok_or_else(|| anyhow::anyhow!("No private key found"))?;
  + let certs: Vec<CertificateDer<'static>> = CertificateDer::pem_file_iter(cert_path)?
  +     .collect::<Result<Vec<_>, _>>()?;
  + let key: PrivatePkcs8KeyDer<'static> = PrivatePkcs8KeyDer::from_pem_file(key_path)?;

  Error messages updated to include path context (small UX
  improvement that came naturally with the rewrite).

elastos-server/Cargo.toml
  axum-server  0.7 → 0.8 (no source changes needed; consumers are
  wire-compatible — verified by grep'ing axum_server uses in
  server_infra.rs and api/server.rs and observing that the archive
  branch's diff against those files is empty).

==== Reconciliation notes ====

elastos-server/src/api/server.rs and elastos-server/src/server_infra.rs
both `use axum_server` directly but at the call surfaces (Server::bind,
TlsConfig, etc.) the 0.7 → 0.8 migration is wire-compatible. Verified
two ways:
  1. Diff of $(merge_base v0.3.0 archive) vs archive on those files
     returns empty — the original v0.2.0 → archive transition didn't
     need source edits there.
  2. cargo check --workspace --exclude crosvm/server/runtime on Mac
     compiles clean post-bump (elastos-tls is what actually pulled
     axum-server 0.8 into our build).

The Day 1 rc-chain pins (ed25519, pkcs8, signature) all held through
the cargo update — verified post-bump:
  ed25519   3.0.0-rc.4    ✓ (pin held)
  pkcs8     0.11.0-rc.11  ✓ (pin held)
  signature 3.0.0-rc.10   ✓ (pin held; not shown but present)

==== Verification ====

  cargo check -p elastos-tls (Mac):                green
  cargo check --workspace --exclude crosvm/server/runtime (Mac): green
  cargo clippy --workspace --exclude crosvm/server/runtime/guest
    --all-targets -- -D warnings (Mac):            green
  cargo fmt --all --check:                         clean
  cargo audit:                                     3 vulns / 4 warnings
  Linux CI on PR #2:                               pushed for verification

Co-authored-by: Cursor <cursoragent@cursor.com>
Day 3 records the axum-server 0.7→0.8 + rustls-pemfile removal that
brings the warning count to 4 (down from 12 baseline). All 4 remaining
warnings are wasmtime transitives or deferred bincode work — none
directly fixable on this branch.

Includes the rationale for the build-cache-save 16m timing (not a
regression: cargo build itself finished in ~7m), the verification
that v0.3.0 didn't drift on elastos-tls/src or elastos-server/src
axum-server callers, and the Day 4 ship-it preview.

Co-authored-by: Cursor <cursoragent@cursor.com>
…eview

Top-level reviewer's doc covering:
  - Headline result (35→3 vulns / 12→4 warnings, 91% / 67% reductions)
  - Why this PR exists (v0.3.0 rebase rationale; CVE work isolated
    from the open Mac VZ product question)
  - Full 4-day timeline with per-day commit + Δ metrics, linked to
    the existing DAY_1.md / DAY_2.md / DAY_3.md notes
  - CVE closure ledger by RUSTSEC ID — closed (32) and accepted (3+4)
  - Cargo.toml + Rust source change summary
  - Reviewer's checklist (read-in-order guide)
  - Final verification record (audit / fmt / clippy / test / CI history)
  - Handoff section with the three project-level decisions Anders
    might want to weigh in on (rsa disposition, bincode migration
    scheduling, Mac VZ direction)

No code changes today — Day 4 is the ship-it day. The next steps
(flip PR #2 from Draft → Ready, close PR #1 with explanatory comment)
are handled outside the commit history via gh CLI.

Co-authored-by: Cursor <cursoragent@cursor.com>
Day 1 of the Mac VZ rebase onto v0.3.0 is a deliberate no-op for code:
the new branch is identical to PR #2 HEAD (CVE work + v0.3.0 main).
Mac VZ work starts landing on Day 2.

Includes the conflict map for Days 2–4 (which files conflict and how
hard), the rationale for branching off PR #2 instead of main directly
(avoids audit regression), and the recovery plan if Day 4's
carrier_bridge.rs reconciliation doesn't converge cleanly.

PR #3 is open as DRAFT / DO NOT REVIEW to surface Linux + Mac VZ CI
during the rebase. Will flip to ready on Day 4.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Day 2 of 4 in the Mac VZ rebase onto v0.3.0 main. Lands the entire
non-conflicting surface of the Mac VZ branch — everything that does
not require reconciling supervisor.rs (Day 3) or carrier_bridge.rs
(Day 4) on top of v0.3.0.

What landed
-----------
* Entire new `elastos-vz` crate (Vz/Apple Virtualization.framework
  substrate). Cleanly compiles on Mac, stubs out on Linux. Registered
  as a workspace member; `elastos-server` picks it up via a
  `[target.'cfg(target_os = "macos")'.dependencies]` block so Linux
  builds are byte-identical.
* `elastos-crosvm` cfg-gating (lib.rs, rootfs.rs, network_stub.rs):
  the crate now compiles cleanly on macOS for the first time, which
  in turn means `cargo check -p elastos-server` succeeds locally on
  Mac without our CVE-rebase-era exclusion list.
* `binaries.rs`: layered v0.3.0's env-var override on top of Mac VZ's
  `ELASTOS_DATA_DIR`-respecting global fallback. Both behaviors retained.
* New ancillary modules with no v0.3.0 conflict: `overlay_initrd.rs`
  (CPIO overlay-init builder used by Mac VZ's substrate boot).
* `security_cmd.rs`, `sources.rs`: small Mac-only changes, no v0.3.0
  conflict.
* CI: `mac-vz.yml`, `release-mac.yml`, `_self-hosted-probe.yml`,
  `linux-untouched.yml` workflows; `ci.yml` trigger update for
  `sash/**` and `vz/**` branches.
* Scripts: `scripts/lib/cross-platform*.sh`, `runtime-cleanup*.sh`,
  `components-json-verify.sh`, `release/*`, `dev/*`,
  `measure-{vz,crosvm}-baseline.sh`, `release-mac.sh`. Several
  existing smoke scripts (`chat-wasm-native-interop-smoke.sh`,
  `home-frontdoor-smoke.sh`, `local-carrier-setup-smoke.sh`,
  `lib/runtime-cleanup.sh`) extended for Mac VZ — v0.3.0 didn't touch
  them so the merge is clean.
* Docs: full `docs/vz-backend/` tree including phase notes, plan,
  threat model, perf baseline, etc. Plus `docs/MAC.md`,
  `docs/ELASTOS_PRD.md`. `state.md` updated for Mac substrate scope.
* Test fixtures inside elastos-vz now include `authority: None` to
  match the v0.3.0 `CapsuleManifest` schema (same fix as PR #2's
  wasm.rs reconciliation).
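
The "compiles on Mac, stubs out on Linux" shape mentioned for `elastos-vz` is commonly expressed with paired `cfg` attributes. The sketch below is illustrative only: the module name `backend` and the function `vz_available` are hypothetical and do not come from the crate.

```rust
/// Real implementation, compiled only on macOS.
#[cfg(target_os = "macos")]
mod backend {
    pub fn vz_available() -> bool {
        // A real backend would probe Virtualization.framework here.
        true
    }
}

/// Stub compiled on every other OS, so callers build without macOS-only code.
#[cfg(not(target_os = "macos"))]
mod backend {
    pub fn vz_available() -> bool {
        false
    }
}

/// Platform-independent entry point; exactly one `backend` module exists per build.
pub fn vz_available() -> bool {
    backend::vz_available()
}
```

Because the two modules expose the same signature, call sites need no `cfg` of their own.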

What is deliberately deferred
-----------------------------
* Day 3: `supervisor.rs` (+148/-88 v0.3.0 vs +3800/-81 Mac VZ),
  `setup.rs`, `vm_provider.rs`, `runtime_control.rs`, `home_cmd.rs`,
  `run_cmd.rs`, `main.rs`, `doctor_cmd.rs`, `vm_debug_cmd.rs`,
  `elastos-guest/runtime.rs`, plus all the new tests under
  `elastos-server/tests/` that depend on Supervisor APIs.
* Day 4: `carrier_bridge.rs`, `runtime.rs` (BridgeContext.on_terminate),
  fuzz target (depends on carrier_bridge framing API),
  `vz_shutdown_semantics.rs`, `vz_chat_interop_smoke.rs`,
  `vz_home_frontdoor_smoke.rs` (all depend on Day 4 BridgeContext).

Verification
------------
Local Mac (`cargo clippy --workspace --exclude elastos-guest --tests
-- -D warnings`) — green (elastos-guest exclusion is the same
preexisting v0.3.0 cross-OS bug we documented in the CVE rebase
SIGNOFF). `cargo fmt --all -- --check` — clean. `cargo audit` — 3
vulns / 4 warnings, identical to PR #2 (no regression). Linux + Mac
VZ CI signal will land via the draft PR #3 push.

Co-authored-by: Cursor <cursoragent@cursor.com>
…d1333

Day 2 surfaced two CI gaps:

1. `linux-untouched.yml` referenced `scripts/check-linux-untouched.sh`
   which I forgot to port from the archive. Pulled it in as-is.

2. The gate's hardcoded baseline `a65dad3` (the original Phase 0
   sash/local-test commit) is unreachable from the rebased branch,
   because the new branch is rooted on PR #2 / v0.3.0 main rather
   than on sash/local-test. Re-baselined to `ded1333` (Day 2 HEAD,
   the first Mac-VZ-rebase commit), so the gate now enforces "no
   Mac-VZ-rebase work modifies elastos-crosvm / elastos-runtime /
   elastos-common / elastos-compute beyond what Day 2 already
   shipped." Day 3+ only touches elastos-server (not in the
   protected list) so this remains a meaningful guardrail.

Inline comment in the workflow documents the rationale and points
back to docs/mac-vz/v030-rebase/DAY_2.md.
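
The rule the gate enforces can be illustrated in a few lines. The real gate is the shell script `scripts/check-linux-untouched.sh`, whose contents are not shown here; this Rust sketch only models the rule as stated (no changed path may fall inside a protected crate), and the function name `protected_violations` and the component-matching logic are assumptions.

```rust
/// Crates the gate protects, as listed in the commit message above.
const PROTECTED_CRATES: [&str; 4] = [
    "elastos-crosvm",
    "elastos-runtime",
    "elastos-common",
    "elastos-compute",
];

/// Returns the changed paths that fall inside a protected crate.
/// A path matches when one of its `/`-separated components equals a
/// protected crate name.
fn protected_violations<'a>(changed: &[&'a str]) -> Vec<&'a str> {
    changed
        .iter()
        .copied()
        .filter(|path| path.split('/').any(|part| PROTECTED_CRATES.contains(&part)))
        .collect()
}
```

In use, the list of changed paths would come from a diff between the baseline commit and HEAD; an empty result means the gate passes.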

Co-authored-by: Cursor <cursoragent@cursor.com>
v0.3.0 introduced (or kept) `std::ptr::null()` for the optional
`termios` / `winsize` args in the openpty test. Apple's libc binding
declares those as `*mut`, so the call fails to compile on Mac runners with
E0308 (`expected raw pointer *mut, found *const _`). Linux libc
accepts either; `null_mut()` is portable to both.
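
Why `null_mut()` is the portable choice follows from Rust's pointer coercion rules: `*mut T` coerces implicitly to `*const T`, but not the other way round. The sketch below shows this without the `libc` crate; `Winsize`, `takes_const`, and `takes_mut` are hypothetical stand-ins for the two binding shapes, not the real `openpty` declarations.

```rust
/// Hypothetical stand-in for the C `winsize` struct.
#[repr(C)]
struct Winsize {
    rows: u16,
    cols: u16,
}

/// Mirrors a binding that declares the optional argument as `*const`.
fn takes_const(p: *const Winsize) -> bool {
    p.is_null()
}

/// Mirrors a binding that declares the optional argument as `*mut`.
fn takes_mut(p: *mut Winsize) -> bool {
    p.is_null()
}

fn null_mut_is_accepted_by_both() -> bool {
    // `*mut T` coerces to `*const T` at the call site, so `null_mut()`
    // satisfies both signatures.
    let a = takes_const(std::ptr::null_mut::<Winsize>());
    let b = takes_mut(std::ptr::null_mut::<Winsize>());
    // `takes_mut(std::ptr::null::<Winsize>())` would be rejected with E0308:
    // there is no implicit `*const T` to `*mut T` coercion.
    a && b
}
```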

The original sash/local-test branch already had this exact fix at
the v0.2.0-shaped equivalent of the same test; lifting it forward
onto the v0.3.0 file unblocks the Mac VZ workflow's Mac runner job.

`elastos-guest` is intentionally outside `linux-untouched.yml`'s
protected-paths list (which only covers `elastos-crosvm`,
`elastos-runtime`, `elastos-common`, `elastos-compute`), so this
change does not violate the rebase gate.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
The Day 2 commit (ded1333) staged its files via `git checkout
archive -- <path>` (which adds to the index) plus a few
in-place edits via the editor's StrReplace tool (which only
modifies the working tree, not the index). The StrReplace edits
were never staged before `git commit` ran, so they were silently
dropped from `ded1333`. This left the branch in a half-ported
state — the new files were committed but the workspace-level
glue that wires them in was missing.

Mac CI surfaced this immediately: `cargo test -p elastos-vz` fails
with `package ID specification 'elastos-vz' did not match any
packages` because the workspace `members` list didn't actually have
the entry. Linux CI passed by coincidence (it doesn't try to build
the macOS-only target dep, and `cargo clippy --workspace` succeeds
without elastos-vz being in the workspace because it just builds
whatever IS listed).

What this commit adds — the actual content of the Day 2 overlays:

* elastos/Cargo.toml: register `crates/elastos-vz` as a workspace member
* elastos/crates/elastos-server/Cargo.toml: add macOS-only target dep on `elastos-vz`
* elastos/crates/elastos-server/src/lib.rs: add `pub mod overlay_initrd`
* elastos/crates/elastos-server/src/binaries.rs: layer Mac's `ELASTOS_DATA_DIR`
  guard around v0.3.0's existing global-fallback block (both behaviors retained)
* elastos/crates/elastos-vz/src/{config,provider,vm}.rs: add `authority: None`
  to in-module test fixtures (v0.3.0's `CapsuleManifest` schema)
* elastos/crates/elastos-vz/tests/{concurrent_launch,smoke}.rs: same fix for
  integration test fixtures
* elastos/Cargo.lock: regenerated to register elastos-vz

After this commit, the branch reflects what the Day 2 commit message
already claimed it landed. No new design decisions, just resync.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ailures on Mac, not regressions

Co-authored-by: Cursor <cursoragent@cursor.com>
…ge reconciled

Land the largest single Mac VZ rebase phase on top of v0.3.0 main:

* `supervisor.rs`, `vm_provider.rs`, `carrier_bridge.rs` reconciled —
  Mac VZ's Mac-substrate work + v0.3.0's surgical changes (principal_id
  on LaunchCapsule, content/fetch contract, read-timeout split, BridgeContext
  principal_id/data_dir fields).
* `setup.rs`, `home_cmd.rs`, `run_cmd.rs`, `main.rs` 3-way merged.
* `runtime_control.rs` + `runtime.rs` Mac portability tweaks layered onto
  v0.3.0 base.
* New Mac-only command surfaces ported: `doctor_cmd.rs`, `vm_debug_cmd.rs`,
  with `pub mod doctor_cmd` exposed via `lib.rs`.
* 8 Mac VZ test files ported: capability_concurrency, vz_chat_interop_smoke,
  vz_home_frontdoor_smoke, vz_perf_harness, vz_shutdown_semantics,
  vz_supervisor_smoke, vz_supervisor_startup_orphan_cleanup,
  tests/common/mod.rs. v0.3.0 schema deltas applied at call sites
  (CapsuleManifest.authority, BridgeContext.{principal_id,data_dir},
  SupervisorRequest::LaunchCapsule.principal_id).

Local validation:
- `cargo check --workspace --tests` clean
- `cargo clippy --workspace --tests -- -D warnings` clean
- `cargo fmt --all -- --check` clean
- 60 supervisor unit tests pass on Mac, including the new
  test_launch_capsule_rejects_principal_for_provider_role and the
  renamed test_content_fetch_via_provider_uses_content_contract.

Deferred to Day 4: full v0.3.0 principal-aware logic in carrier_bridge.rs
(scope_current_user_alias, principal_root_read_write_uri,
protected_principal_root_carrier_response, rooted_localhost_fs_path).
The fields are wired through; the read/write paths still operate in
non-principal-aware mode until Day 4 finishes the merge.

The 6 pre-existing v0.3.0 Mac cross-OS test failures in
gateway_browser_route_tests remain — same class of issue as the 14 we
documented on Day 2. Out of scope for the rebase; flagged for Anders.

See docs/mac-vz/v030-rebase/DAY_3.md for the full per-file 3-way merge
notes.

Co-authored-by: Cursor <cursoragent@cursor.com>
Day 3 left the same 6 gateway_browser_route_tests failing on Mac CI
(green on Linux). I dug locally and confirmed they all return HTTP
500 with body "path must be shorter than SUN_LEN" — i.e. v0.3.0's
gateway_browser_stream::browser_runtime_stream_socket_path builds the
runtime stream socket under std::env::temp_dir().join(
"elastos-browser-streams"), which on macOS resolves under the
per-user /var/folders/<XX>/<YY>/T/... private temp tree. Combined
with tempfile::tempdir()'s nested cache dir and the SHA256-hex
socket file name, the absolute path exceeds Darwin's 104-byte
SUN_LEN limit (Linux's sun_path is 108 bytes and stays under).
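
The length arithmetic behind the failure can be checked up front. The sketch below is a hedged illustration, not code from the repository: `fits_sun_path` and `overflow_bytes` are hypothetical helpers, and the strict less-than comparison is an assumption chosen to match the wording of the error ("must be shorter than"), leaving one byte for the trailing NUL.

```rust
use std::path::Path;

/// Size of `sockaddr_un.sun_path` in bytes on Darwin and on Linux.
const SUN_PATH_DARWIN: usize = 104;
const SUN_PATH_LINUX: usize = 108;

/// True when `path` is strictly shorter than `capacity`, which leaves room
/// for the trailing NUL byte.
fn fits_sun_path(path: &Path, capacity: usize) -> bool {
    path.as_os_str().len() < capacity
}

/// How many bytes over the limit a candidate path is (0 when it fits).
fn overflow_bytes(path: &Path, capacity: usize) -> usize {
    (path.as_os_str().len() + 1).saturating_sub(capacity)
}
```

A path of exactly 104 bytes illustrates the platform split: it fails the Darwin check by one byte while still fitting under the Linux limit.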

This is a v0.3.0-on-Mac platform incompatibility in the test path,
not a Day 3 regression. Captured in DAY_3.md alongside the existing
"deferred to Day 4 / out of scope for the rebase" note so Anders
has the full diagnostic trail.

Co-authored-by: Cursor <cursoragent@cursor.com>
…are logic

Day 3 deferred this. The Mac VZ archive's carrier_bridge.rs predates
v0.3.0's `carrier_invoke` ABI migration and principal-aware
localhost-fs scoping; landing it verbatim on top of v0.3.0 main left
the bridge speaking the OLD `provider_call` request type while the
guest in elastos-guest already speaks the new `carrier_invoke` type.
Linux CI was green only because no end-to-end guest↔host bridge test
exercises a real socketpair; the unit tests cover the bridge logic
in isolation.

This commit takes v0.3.0 main's carrier_bridge.rs as the **logic
base** and re-layers the Mac VZ framing/lifecycle additions on top:

v0.3.0 logic (now present, was missing on the branch):
- `carrier_invoke` request type (replacing `provider_call`)
- `carrier_invoke_dispatch` parser
- `protected_principal_root_carrier_response`
- `principal_root_read_write_uri`, `request_content_bytes`,
  `apply_read_window`, `provider_ok_result`, `provider_error_result`,
  `carrier_error_response`
- `scope_current_user_alias` and `is_unscoped_current_user_alias`
- `provider_scheme_for_carrier_uri`, `wallet_signature_parts_from_uri`
- `is_runtime_control_request` (rejects raw runtime-control surfaces
  including the legacy `provider_call`)
- All 24 v0.3.0 unit tests for the above

Mac VZ surface (preserved on top):
- `CARRIER_MAX_LINE_BYTES` constant + `CarrierFrameError` enum +
  `parse_carrier_line` pub fn for the fuzz harness (Phase 10 Day 4-8)
- `read_line_byte_budgeted` + `drain_to_newline` byte-budgeted line
  reader (Phase 10.5 M1)
- `BridgeContext.on_terminate: Option<Arc<Notify>>` (Phase 4 Day 6)
- `spawn_carrier_bridge_on_stream` socketpair entry point (Phase 3
  Day 4)
- Shared `run_carrier_bridge_loop` extracted from the path-based
  bind/accept flow so both Linux and Mac use the same dispatch loop
- `on_terminate.notify_waiters()` fires on every loop exit (EOF,
  read error, write error, oversized-line teardown, accept failure)
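
A byte-budgeted line reader of the kind named above can be sketched over `std::io::BufRead`. This is an assumption-laden illustration rather than the bridge's implementation: the real `read_line_byte_budgeted` and `drain_to_newline` signatures are not shown in this PR, `BudgetedLine` is a hypothetical result type, and the real bridge tears the loop down on an oversized line, whereas this sketch drains to the newline and keeps reading so the behaviour is easy to observe.

```rust
use std::io::{self, BufRead};

/// Outcome of one budgeted read (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum BudgetedLine {
    /// A complete line within the budget, newline stripped.
    Line(Vec<u8>),
    /// The line exceeded the budget; its bytes were discarded up to the newline.
    TooLarge,
    /// End of stream with no pending bytes.
    Eof,
}

/// Reads one `\n`-terminated line while never buffering more than `max_bytes`.
fn read_line_budgeted<R: BufRead>(reader: &mut R, max_bytes: usize) -> io::Result<BudgetedLine> {
    let mut line = Vec::new();
    let mut overflow = false;
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            // End of stream: report whatever state has accumulated.
            return Ok(if overflow {
                BudgetedLine::TooLarge
            } else if line.is_empty() {
                BudgetedLine::Eof
            } else {
                BudgetedLine::Line(line)
            });
        }
        let (chunk, found_newline, consumed) = match buf.iter().position(|&b| b == b'\n') {
            Some(i) => (&buf[..i], true, i + 1),
            None => (buf, false, buf.len()),
        };
        if !overflow {
            if line.len() + chunk.len() > max_bytes {
                // Over budget: drop what was buffered and switch to draining.
                overflow = true;
                line.clear();
            } else {
                line.extend_from_slice(chunk);
            }
        }
        reader.consume(consumed);
        if found_newline {
            return Ok(if overflow {
                BudgetedLine::TooLarge
            } else {
                BudgetedLine::Line(line)
            });
        }
    }
}
```

The point of the budget is that a peer cannot make the host allocate without bound by withholding the newline; memory use is capped at `max_bytes` plus the reader's internal buffer.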

Local validation:
- `cargo check --workspace --tests` clean
- `cargo clippy --workspace --tests -- -D warnings` clean
- `cargo fmt --all -- --check` clean
- `cargo test -p elastos-server --lib carrier_bridge::` 24/24 pass
- `cargo test -p elastos-server --lib supervisor::` 60/60 pass
- `cargo test -p elastos-server --lib` 646 pass / 6 fail / 2 ignored
  — the 6 failures are the same v0.3.0-on-Mac SUN_LEN socket-path
  defects already documented in DAY_3.md; no Day 4 regressions.

Co-authored-by: Cursor <cursoragent@cursor.com>
…base

Day 4 walk-through (sash/local-test-v030 vs archive/sash/local-test)
revealed that the carrier-bridge fuzz harness was the only Mac VZ
surface that didn't make it through the v0.3.0 rebase. Re-staged
verbatim from the archive:

- elastos-server/fuzz/.gitignore
- elastos-server/fuzz/Cargo.toml (its own [workspace] — uses nightly
  for libfuzzer-sys without disturbing the parent stable toolchain)
- elastos-server/fuzz/dict/carrier_bridge_framing.dict
- elastos-server/fuzz/fuzz_targets/carrier_bridge_framing.rs
- elastos-server/fuzz/corpus/carrier_bridge_framing/*.{json,empty,
  spaces,blank-lines,oversized,truncated-utf8,...} (24 seed inputs)

The harness consumes the public surface I preserved on Day 4's
carrier_bridge.rs reconcile: `parse_carrier_line`,
`CarrierFrameError`, `CARRIER_MAX_LINE_BYTES`. It asserts that the
framing parser never panics, that oversized inputs short-circuit with
`LineTooLarge`, and that every byte slice yields `Ok` or `Err`.
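
The properties the harness checks can be sketched without libFuzzer. The block below is an illustration under stated assumptions: the names `parse_carrier_line`, `CarrierFrameError`, `CARRIER_MAX_LINE_BYTES`, and `LineTooLarge` come from the commit message, but the signature, the constant's value, the `InvalidUtf8` and `Empty` variants, and the parser body are hypothetical stand-ins, not the code in carrier_bridge.rs.

```rust
/// Hypothetical cap; the real value is not stated in the commit message.
pub const CARRIER_MAX_LINE_BYTES: usize = 1024;

#[derive(Debug, PartialEq)]
pub enum CarrierFrameError {
    LineTooLarge, // named in the commit message
    InvalidUtf8,  // assumed variant, for illustration
    Empty,        // assumed variant, for illustration
}

/// Stand-in framing parser: size check first, then UTF-8, then blank check.
pub fn parse_carrier_line(bytes: &[u8]) -> Result<&str, CarrierFrameError> {
    // Short-circuit before doing any work proportional to the input.
    if bytes.len() > CARRIER_MAX_LINE_BYTES {
        return Err(CarrierFrameError::LineTooLarge);
    }
    let text = std::str::from_utf8(bytes).map_err(|_| CarrierFrameError::InvalidUtf8)?;
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Err(CarrierFrameError::Empty);
    }
    Ok(trimmed)
}

/// The body a fuzz target would run on every generated input:
/// returning at all proves "no panic"; the size rule is asserted explicitly.
pub fn check_framing_properties(data: &[u8]) {
    let result = parse_carrier_line(data);
    if data.len() > CARRIER_MAX_LINE_BYTES {
        assert_eq!(result, Err(CarrierFrameError::LineTooLarge));
    }
}
```

A real target would wrap `check_framing_properties` in the fuzzing crate's entry-point macro; the seed corpus categories listed above (empty, spaces, oversized, truncated UTF-8) map onto the error branches of the stand-in parser.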

Operator usage: `cargo +nightly fuzz run carrier_bridge_framing`.

No changes to production code: the fuzz crate is its own workspace and
does not touch the main build. Linux/Mac CI behavior unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
…seline

Closes the v0.3.0 rebase. Three things in one commit because they
form the cohesive Day-4 sign-off bundle:

1. `.github/workflows/linux-untouched.yml`: re-baseline
   `VZ_BACKEND_BASELINE` from Day 2's `ded1333` to Day 4's `65f5f05`
   (the carrier-bridge fuzz harness restoration commit, which is the
   rebase's final-state HEAD). Day 3 + Day 4 commits only touch
   elastos-server (NOT protected), so the gate continues to enforce
   "no future commit modifies elastos-crosvm / elastos-runtime /
   elastos-common / elastos-compute beyond what the rebase already
   shipped." Gate tested locally on the rebased branch — all four
   protected paths show 0 diff vs the new baseline.

2. `docs/mac-vz/v030-rebase/DAY_4.md`: sign-off doc with what landed
   in the carrier_bridge.rs reconcile (the inverted strategy — v0.3.0
   logic base + Mac VZ framing/lifecycle on top, in contrast to Day
   3's archive base + v0.3.0 fields), the workspace walk-through
   summary (only the fuzz harness was missing), the local validation
   pattern (cargo check/clippy/fmt clean, unit tests green except
   the same 6 pre-existing Mac SUN_LEN failures), and the carry-
   forward known issues (SUN_LEN, components.json darwin-arm64 gap,
   real-kernel boot tests requiring codesigning).

3. `docs/vz-backend/V030_MESSAGE_DRAFT.md`: refresh the Anders
   handoff message to reflect "rebase complete" rather than the pre-
   rebase "two real conflicts" framing. Three short paragraphs, no
   action requested, evidence-led, points at DAY_4.md for depth.

Co-authored-by: Cursor <cursoragent@cursor.com>
@SashaMIT changed the title from "[DRAFT — DO NOT REVIEW] Mac VZ branch refresh against v0.3.0 (Day 1/4)" to "[DRAFT — review-ready] Mac VZ branch refresh against v0.3.0 (4/4 complete)" on May 28, 2026
…ions

The Day-4 status note muted the genuine open questions under "no
action requested." That was wrong — there are decisions only Anders
can make for v0.3.1 (scope, PR #3 landing strategy, the 6 SUN_LEN
failures, components.json darwin-arm64, principal-rooted scoping on
Mac, provider_call deprecation, codesigning in dev loop). Adding a
second paste-ready section to V030_MESSAGE_DRAFT.md with seven
numbered items so Anders can reply per-question at his own pace.

Co-authored-by: Cursor <cursoragent@cursor.com>
@SashaMIT changed the title from "[DRAFT — review-ready] Mac VZ branch refresh against v0.3.0 (4/4 complete)" to "[PARKED — v0.3.2+ track] Mac VZ branch refresh against v0.3.0 (rebase complete, not for v0.3.1 merge)" on May 29, 2026
…k (Day 1)

Anders answered all seven open v0.3.1-shape questions. Headline: macOS
stays browser-hosted, native Mac VZ parity gated behind real Apple
Silicon hardware proof, PR #3 parked as a v0.3.2+ branch (not a v0.3.1
merge candidate).

- Add DECISIONS.md with the seven rulings verbatim + per-decision
  follow-up mapping + the merge acceptance bar.
- Decision #5 verify-first finding: the Mac launch path builds
  BridgeContext (principal_id/data_dir) byte-identically to Linux and
  feeds the same run_carrier_bridge_loop dispatch, so principal-rooted
  scoping is already enforced at parity (not flat-rooted). Remaining
  work is a Mac-gated parity test, not an implementation change.
- Link DECISIONS.md from DAY_4.md and the Anders message draft.

Co-authored-by: Cursor <cursoragent@cursor.com>
SashaMIT added a commit that referenced this pull request Jun 17, 2026
…ent audit, action enforcement, hardening pass

Closes the four highest pre-audit findings as fail-closed boundaries, each
with a regression test.

#1 CEK reconstruction integrity (HIGH): bind the reconstructed CEK to a
   published commitment before use and add 3-share cheater-detection on the
   live quorum/threshold open path. A Byzantine node returning a well-formed
   but wrong-valued share now causes the open to fail closed; an uncommitted
   2-of-2 open is refused rather than yielding a silently wrong key.
   (ddrm-envelope, decrypt/encrypt/key providers, media-authority/producer rails)

#2 Tamper-evident audit + content-open custody (GAP-8): hash-chained
   (seq + prev_hash + record_hash), ed25519-signed records with a crypto-
   agility tag, persisted 0600 signing key, verify_chain, emit() -> Result.
   A content-open event is emitted on the viewer open path; a failed audit
   append fails the open closed (503). ed25519 lives in the trusted core
   (no capsule ML-DSA dependency). Caveat in-code: tamper-evident against
   external editing + non-repudiable, NOT against a live-compromised runtime
   (external anchoring is a deliberate follow-on).
   (elastos-runtime primitives/audit.rs, elastos-server viewer_open/gateway)
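
The hash-chain part of #2 can be illustrated with a small sketch. Assumptions, stated loudly: `std`'s `DefaultHasher` stands in for a cryptographic hash and is NOT tamper-resistant, the ed25519 signature and crypto-agility tag are omitted, and the struct layout and the `append` helper are hypothetical. Only the `seq` / `prev_hash` / `record_hash` fields and the `verify_chain` name come from the commit message.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub seq: u64,
    pub prev_hash: u64,
    pub event: String,
    pub record_hash: u64,
}

/// Stand-in digest over (seq, prev_hash, event).
/// A real implementation would use a cryptographic hash here.
fn digest(seq: u64, prev_hash: u64, event: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seq.hash(&mut h);
    prev_hash.hash(&mut h);
    event.hash(&mut h);
    h.finish()
}

/// Append a record that commits to its predecessor's hash.
pub fn append(chain: &mut Vec<AuditRecord>, event: &str) {
    let (seq, prev_hash) = match chain.last() {
        Some(last) => (last.seq + 1, last.record_hash),
        None => (0, 0), // genesis record links to a fixed zero value
    };
    chain.push(AuditRecord {
        seq,
        prev_hash,
        event: event.to_string(),
        record_hash: digest(seq, prev_hash, event),
    });
}

/// Recompute every link; any edited, reordered, or dropped record breaks it.
pub fn verify_chain(chain: &[AuditRecord]) -> bool {
    let mut expected_prev = 0u64;
    for (i, rec) in chain.iter().enumerate() {
        let ok = rec.seq == i as u64
            && rec.prev_hash == expected_prev
            && rec.record_hash == digest(rec.seq, rec.prev_hash, &rec.event);
        if !ok {
            return false;
        }
        expected_prev = rec.record_hash;
    }
    true
}
```

Because each record's hash covers the previous record's hash, editing or removing one record invalidates every later link, which is what makes external edits evident. As the in-code caveat above notes, this alone does not protect against a runtime that is compromised while it is writing the chain.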

#3 Central action-enforcement: validate every provider dispatch against the
   operation's required action via required_action_for(op), not the token's
   self-declared action. Enforced centrally at the trust boundary (bridge +
   HTTP provider proxy); fail-closed Admin default for unmapped ops.
   (provider_resource.rs, carrier_bridge.rs, handlers/provider.rs)
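
A minimal sketch of the central check described in #3. `required_action_for` and the fail-closed `Admin` default are from the commit message; the `Action` variants other than `Admin`, the operation names, the `authorize` helper, and the assumption that a higher action implies the lower ones are all hypothetical.

```rust
/// Ordered so that a stronger grant satisfies a weaker requirement
/// (an assumption of this sketch, not confirmed by the source).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Action {
    Read,
    Write,
    Admin,
}

/// Map an operation to the action it requires.
/// Unmapped operations fall through to Admin, so new ops are closed by default.
pub fn required_action_for(op: &str) -> Action {
    match op {
        "read" | "list" | "stat" => Action::Read, // hypothetical op names
        "write" | "delete" => Action::Write,      // hypothetical op names
        _ => Action::Admin,
    }
}

/// Gate a dispatch on what the operation needs, not on what the
/// token claims the request is.
pub fn authorize(token_action: Action, op: &str) -> Result<(), String> {
    let required = required_action_for(op);
    if token_action >= required {
        Ok(())
    } else {
        Err(format!(
            "op {op:?} requires {required:?}, token grants {token_action:?}"
        ))
    }
}
```

The point of deriving the requirement from `op` is that a token cannot downgrade a write into a read by labelling itself; an operation missing from the table is rejected for everything below `Admin`.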

#4 Quick hardening:
   (a) DKMS_AUTHORITY_NODE_SET_ID_B64 mandatory in release — absent it, the
       node refuses to authorize against a caller-declared node-set
       (cross-quorum replay defense); fallback only in test/dev-modes.
   (b) MAX_LINE_BYTES (16 MiB) cap in vsock-proxy: all line reads routed
       through a bounded reader so a newline-less peer can't OOM the bridge.
   (c) gf256_mul made branchless (arithmetic masks, no tables, fixed 8 iters)
       to remove the secret-dependent control-flow timing channel.
   (d) effective_now clamped to the node clock at issuance so a caller can
       only shorten its session window, never extend it past its TTL.
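
For (c), a branchless GF(2^8) multiply can be sketched as follows. The reduction polynomial is an assumption: this uses the AES polynomial x^8 + x^4 + x^3 + x + 1 (low byte `0x1b`), and the commit message does not say which polynomial the project's `gf256_mul` uses.

```rust
/// Multiply two GF(2^8) elements with no secret-dependent branches or
/// table lookups: exactly 8 iterations, selection done with bit masks.
/// Assumes the AES reduction polynomial (0x11b); the real code may differ.
pub fn gf256_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    for _ in 0..8 {
        // 0xFF if the low bit of b is set, 0x00 otherwise.
        let add_mask = (b & 1).wrapping_neg();
        product ^= a & add_mask;

        // 0xFF if a's high bit is set, meaning the shift needs reduction.
        let reduce_mask = (a >> 7).wrapping_neg();
        a = (a << 1) ^ (0x1b & reduce_mask);

        b >>= 1;
    }
    product
}
```

Both masks are computed arithmetically from the operand bits, so the instruction sequence is the same for every input pair. Whether the compiled output stays branch-free still depends on the compiler and target, which is worth confirming in the generated assembly.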

Also: widen check-wci-alignment.sh to exempt manifest-less crates (crypto
libraries / node binaries are not deployable app capsules) and dependency
files; HANDOVER Day 139 checkpoint; incidental cargo-fmt normalization to
keep the workspace fmt-clean.

Verified per-crate: ddrm-envelope 40/40, dkms-authority 24/24, vsock-proxy
3/3; alignment-check green. (These crates are outside the elastos workspace,
so `just verify` does not exercise them.)

Co-authored-by: Cursor <cursoragent@cursor.com>
SashaMIT pushed a commit that referenced this pull request Jun 17, 2026
The glass box can now act, in the fail-safe direction: revoke a capability,
which only ever reduces authority. Grounded in PRINCIPLES (#3, #16, #11) and
the bearer-token model.

- New endpoint elastos://inspect/revoke (mutation). Requires a Write inspect
  capability at System scope (or shell); revokes a token by id via
  CapabilityManager::revoke and emits an inspect.revoke audit event.
- Read vs write separation is enforced by the capability *action* dimension,
  not just the resource: handle_inspect now selects required_action = Write for
  revoke, Read otherwise. A read-only inspect grant can never drive a mutation.
- Self-only scope is rejected for revoke (defense in depth on top of the
  action gate).
- 4 conformance tests: read-only token cannot revoke (the crux — and the
  victim stays valid); shell can revoke (victim then fails validation);
  non-shell Write+System operator can revoke; malformed id -> invalid_token_id.
- docs: revoke endpoint + read/write tier in the security model and contract.
- UI: inspectRevoke bridge stub, intentionally not wired to the token-free read
  view (Principle #16) — a dedicated System admin surface supplies the id.
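
The read/write separation above can be sketched as a small gate. The endpoint URI, the Write-for-revoke rule, the System-scope requirement, and the shell exception are from the commit message; the type names, the `is_shell` flag, the error strings, and the rule that any inspect grant may read are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Read,
    Write,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    SelfOnly,
    System,
}

/// Hypothetical view of an inspect capability for this sketch.
pub struct InspectCapability {
    pub action: Action,
    pub scope: Scope,
    pub is_shell: bool,
}

/// Revoke is the only mutation; every other inspect endpoint is a read.
pub fn required_action(endpoint: &str) -> Action {
    if endpoint == "elastos://inspect/revoke" {
        Action::Write
    } else {
        Action::Read
    }
}

pub fn check_inspect(cap: &InspectCapability, endpoint: &str) -> Result<(), &'static str> {
    if cap.is_shell {
        return Ok(());
    }
    if required_action(endpoint) == Action::Write {
        // Action gate first: a read-only grant can never drive a mutation.
        if cap.action != Action::Write {
            return Err("action_denied");
        }
        // Scope gate second: defense in depth on top of the action gate.
        if cap.scope != Scope::System {
            return Err("scope_denied");
        }
    }
    Ok(())
}
```

The two checks are deliberately separate so that a misissued Write grant at Self-only scope is still refused, matching the "defense in depth" bullet.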

Verified: cargo test -p elastos-runtime --lib — 278 passed; 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_016ZKy5Cca9RzwDuLb1szdeq